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ABSTRACT 

We introduce a stable, well tested Python implementation of the affine- 
invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed 
by Goodman & Weare (2010). The code is open source and has already been 
used in several published projects in the astrophysics literature. The algorithm 
behind emcee has several advantages over traditional MCMC sampling methods 
and it has excellent performance as measured by the autocorrelation time (or 
function calls per independent sample). One major advantage of the algorithm 
j is that it requires hand-tuning of only 1 or 2 parameters compared to ~ A^^ 

for a traditional algorithm in an A^- dimensional parameter space. In this docu- 
ment, we describe the algorithm and the details of our implementation and API. 
Exploiting the parallelism of the ensemble method, emcee permits any user to 
take advantage of multiple CPU cores without extra effort. The code is available 
online at http://danfni.ca/emcee under the GNU General Public License v2. 

Subject headings: methods: data analysis — methods: numerical — methods: 
statistical 
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Note: If you want to get started immediately with the emcee package, start at Appendix A 
on page 14- If you are sampling with emcee and having low- acceptance-rate or other issues, 
there is some advice in Section 4 starting on page 9. 



Probabilistic data analysis — including Bayesian inference — has transformed scientific 
research in the past decade. Many of the most significant gains have come from numerical 
methods for approximate inference, especially Markov chain Monte Carlo (MCMC). For 
example, many problems in cosmology and astrophysics^ have directly benefited from MCMC 
because the models are often expensive to compute, there are many free parameters, and 
the observations are usually low in signal-to-noise. 

Probabilistic data analysis procedures involve computing and using either the posterior 
probability density function (PDF) for the parameters of the model or the likelihood function. 
In some cases it is sufficient to find the maximum of one of these, but it is often necessary to 
understand the posterior PDF in detail. MCMC methods are designed to sample from — and 
thereby provide sampling approximations to — the posterior PDF efficiently even in parameter 
spaces with large numbers of dimensions. This has proven useful in too many research 
applications to list here but the results from the NASA Wilkinson Microwave Anisotropy 
Probe (WMAP) cosmology mission provide a dramatic example (for example, Dunkley et al. 



Arguably the most important advantage of Bayesian data analysis is that it is possi- 
ble to marginalize over nuisance parameters. A nuisance parameter is one that is required 
in order to model the process that generates the data, but is otherwise of little interest. 
Marginalization is the process of integrating over all possible values of the parameter and 
hence propagating the effects of uncertainty about its value into the final result. Often we 
wish to marginalize over all nuisance parameters in a model. The exact result of marginal- 
ization is the marginalized probability function p{Q\D) of the set (list or vector) of model 
parameters B given the set of observations D 



where a is the set (list or vector) of nuisance parameters. Because the nuisance parameter 



^The methods and discussion in this document have general apphcabihty, but we will mostly present 
examples from astrophysics and cosmology, the fields in which we have most experience 
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set ot can be very large, this integral is often extremely daunting. However, a MCMC- 
generated sampling of values (Gt,at) of the model and nuisance parameters from the joint 
distribution p(0, a|-D) automatically provides a sampling of values 6t from the marginalized 
PDF p(e|D). 

In addition to the problem of marginalization, in many problems of interest the likelihood 
or the prior is the result of an expensive simulation or computation. In this regime, MCMC 
sampling is very valuable, but it is even more valuable if the MCMC algorithm is efficient, 
in the sense that it does not require many function evaluations to generate a statistically 
independent sample from the posterior PDF. The methods presented here are designed for 
efficiency. 

Most uses of MCMC in the astrophysics literature are based on slight modifications to 
the Metropolis-Hastings (M-H) method (introduced below in Section 2). Each step in a M-H 
chain is proposed using a compact proposal distribution centered on the current position of 
the chain (normally a multivariate Gaussian or something similar). Since each term in the 
covariance matrix of this proposal distribution is an unspecified parameter, this method has 
[A^ + l]/2 tuning parameters (where is the dimension of the parameter space). To make 
matters worse, the performance of this sampler is very sensitive to these tuning parameters 
and there is no fool-proof method for choosing the values correctly. As a result, many 
heuristic methods have been developed to attempt to determine the optimal parameters 
in a data-driven way (for example, Gregory 2005; Dunkley etal. 2005; Widrow etal. 2008). 
Unfortunately, these methods all require a lengthy "burn-in" phase where shorter Markov 
chains are sampled and the results are used to tune the hyperparameters. This extra cost is 
unacceptable when the likelihood calls are computationally expensive. 

The problem with traditional sampling methods can be visualized by looking at the 
simple but highly anisotropic density 



which would be considered difficult (in the small-e regime) for standard MCMC algorithms. 
In principle, it is possible to tune the hyperparameters of a M-H sampler to make this 
sampling converge quickly, but if the dimension is large and calculating the density is com- 
putationally expensive the tuning procedure becomes intractable. Also, since the number of 
parameters scales as ~ A^^, this problem gets much worse in higher dimensions. Equation (2) 
can, however, be transformed into the much easier problem of sampling an isotropic density 
by an affine transformation of the form 




(2) 



y2 = Xi+ X2 



(3) 



yi = 
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This motivates affine invariance: an algorithm that is affine invariant performs equally 
well under all linear transformations; it will therefore be insensitive to covariances among 
parameters. 

Extending earlier work by Christen (2007), Goodman & Weare (2010, hereafter GWIO) 
proposed an affine invariant sampling algorithm (Section 2) with only two hyperparameters to 
be tuned for performance. Hou etal. (2011) were the first group to implement this algorithm 
in astrophysics. The implementation presented here is an independent effort that has already 
proved effective in several projects (Lang & Hogg 2011; Bovy etal. 2011; Dorman etal. 2012, 
Foreman-Mackey & Widrow 2012, in prep.). In what follows, we summarize the algorithm 
from GWIO and the implementation decisions made in emcee. We also describe the small 
changes that must be made to the algorithm to parallelize it. Finally, in the Appendices, we 
outline the installation, usage and troubleshooting of the package. 



A complete discussion of MCMC methods is beyond the scope of this document. In- 
stead, the interested reader is directed to a classic reference like MacKay (2003) and we will 
summarize some key concepts below. 

The general goal of MCMC algorithms is to draw M samples {6i} from the posterior 
probability density 



where the prior distribution p(0, a) and the likelihood function p(D|0, a) can be relatively 
easily (but not necessarily quickly) computed for any particular value of (6j,Q;j). The nor- 
malization Z = p{D) is independent of and a once we have chosen the form of the 
generative model. This means that it is possible to sample from p(0, C(\D) without comput- 
ing Z — unless one would like to compare the validity of two different generative models. 
This is important because Z is generally very expensive to compute. 

Once the samples produced by MCMC are available, the marginalized constraints on B 
can be approximated by the histogram of the samples projected into the parameter subspace 
spanned by 0. In particular, this implies that the expectation value of a function of the 
model parameters /(0) is 



Generating the samples 0, is a non-trivial process unless p{Q, a, D) is a very specific analytic 



2. The Algorithm 




(4) 




(5) 



5 



distribution (for example, a Gaussian). MCMC is a procedure for generating a random walk 
in the parameter space that, over time, draws a representative set of samples from the 
distribution. Each point in a Markov chain X{ti) = [0j, a^] depends only on the position of 
the previous step 

The Metropolis-Hastings (M— H) Algorithm The simplest and most commonly used 
MCMC algorithm is the M-H method (Algorithm 1; MacKay 2003; Gregory 2005; Press etal. 
2007; Hogg, Bovy & Lang 2010). The iterative procedure is as follows: (1) given a position 
X{t) sample a proposal position Y from the transition distribution Q{Y; X{t)), (2) accept 
this proposal with probability 



The transition distribution Q{Y] X{t)) is an easy-to-sample probability distribution for the 
proposal Y given a position X{t). A common parameterization of Q{Y] X[t)) is a multivari- 
ate Gaussian distribution centered on X{t) with a general covariance tensor that has been 
tuned for performance. It is worth emphasizing that if this step is accepted X{t + 1) = Y; 
Otherwise, the new position is set to the previous one X{t + 1) = X{t) (in other words, the 
position X{t) is repeated in the chain). 

The M-H algorithm converges (as t — )■ oo) to a stationary set of samples from the 
distribution but there are many algorithms with faster convergence and varying levels of 
implementation difficulty. Faster convergence is preferred because of the reduction of com- 
putational cost due to the smaller number of likelihood computations necessary to obtain the 
equivalent level of accuracy. The inverse convergence rate can be measured by the autocor- 
relation function and more specifically, the integrated autocorrelation time (see Section 3). 
This quantity is an estimate of the number of steps needed in the chain in order to draw inde- 
pendent samples from the target density. A more efficient chain has a shorter autocorrelation 
time. 

The stretch move GWIO proposed an affine-invariant ensemble sampling algorithm in- 
formally called the "stretch move." This algorithm significantly outperforms standard M-H 
methods producing independent samples with a much shorter autocorrelation time (see Sec- 
tion 3 for a discussion of the autocorrelation time). For completeness and for clarity of 
notation, we summarize the algorithm here and refer the interested reader to the original 
paper for more details. This method involves simultaneously evolving an ensemble of K 
walkers S = {X^} where the proposal distribution for one walker k is based on the current 




(6) 
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Algorithm 1 The procedure for a single Metropolis-Hastings MCMC step. 

1: Draw a proposal Y Q{Y; X{t)) 

2: q ^ [p(Y) Q{X(t); Y)]/[p{X{t)) Q(Y; X{t))] // This line is generally expensive 

3: r ^ ~ [0, 1] 

4: if r < g then 

5: X{t+1)^Y 

6: else 

7: X{t+l)^X{t) 

8: end if 



positions of the K — 1 walkers in the complementary ensemble S[k] = {Xj, Vj ^ k}. Here, 
"position" refers to a vector in the A^-dimensional, real-valued parameter space. 

To update the position of a walker at position Xj^, a walker Xj is drawn randomly from 
the remaining walkers S^k] and a new position is proposed: 

Xk{t) -^Y = X^ + Z [X,{t) - X,] (7) 

where Z is a random variable drawn from a distribution g{Z = z). It is clear that if g 
satisfies 

g{z-^) = zg{z), (8) 

the proposal of Equation (7) is symmetric. In this case, the chain will satisfy detailed balance 
if the proposal is accepted with probability 

where is the dimension of the parameter space. This procedure is then repeated for each 
walker in the ensemble in series following the procedure shown in Algorithm 2. 

GWIO advocate a particular form of g{z), namely 

'1 



g{z) oc 



if z G 

z 



-,a 
a 



(10) 



otherwise 

where a is an adjustable scale parameter that GWIO set to 2. 



The parallel stretch move It is tempting to parallelize the stretch move algorithm by 
simultaneously advancing each walker based on the state of the ensemble instead of evolving 
the walkers in series. Unfortunately, this subtly violates detailed balance. Instead, we 
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Alf 


^orithm 2 A single stretch move update step from GWIO 


1: 


for k = 1, . . . , K do 


2: 


Draw a walker Xj at random from the complementary ensemble S[k] (t) 


3: 


z ^ Z g{z), Equation (10) 


4: 


Y <-Xj+z [Xkit) - Xj] 


5: 


q Z^~^ p{Y) / p{^X]^{ty) // This line is generally expensive 


6: 


r ^ i? ~ [0, 1] 


7: 


if r < g, Equation (9) then 


8: 


Xu{t + l)^Y 


9: 


else 


10: 


Xuit + I) ^ Xkit) 


11: 


end if 


12: 


end for 



must split the full ensemble into two subsets (5^°^ = {X^, V/c = 1, . . . ,K /2} and 5^^^ = 
{Xk, V/c = K/2 + 1, . . . , K}) and simultaneously update all the walkers in 5**^"^ — using the 
stretch move procedure from Algorithm 2 — based only on the positions of the walkers in 
the other set {S^^^). Then, using the new positions S^'^\ we can update S^^\ In this case, the 
outcome is a valid step for all of the walkers. The pseudocode for this procedure is shown 
in Algorithm 3. This code is similar to Algorithm 2 but now the computationally expensive 
inner loop (starting at line 2 in Algorithm 3) can be run in parallel. 

The performance of this method — quantified by the autocorrelation time — is compa- 
rable to the serial stretch move algorithm but the fact that one can now take advantage of 
generic parallelization makes it extremely powerful. 

3. Tests 

Judging the convergence and performance of an algorithm is a non-trival problem and 
there is a huge associated literature (see, for example, Cowles & Carlin 1996, for a review). In 
astrophysics, spectral methods have been used extensively (for example Dunkley etal. 2005). 
Below, we advocate for one such method: the autocorrelation time. The autocorrelation time 
is especially applicable because it is an affine invariant measure of the performance. 

First, however, we should take note of another extremely important measurement: the 
acceptance fraction a/. This is the fraction of proposed steps that are accepted. There 
appears to be no agreement on the optimal acceptance rate but it is clear that both extrema 
are unacceptable. If ct/ ~ 0, then nearly all proposed steps are rejected, so the chain 
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Algorithm 3 The parallel stretch move update step 
1: for i G {0, 1} do 
2: for A; = 1,...,A72 do 

3: // This loop can now be done in parallel for all k 

4: Draw a walker Xj at random from the complementary ensemble S'*-"*-* (t) 
5: Xfc ^ Sj^^ 

6: z ^ Z 9{.z), Equation (10) 

7: Y ^Xj + z [Xk{t) - Xj] 

8: Z^-^p{Y)/p{Xk{t)) 

9: r ^ ~ [0, 1] 
10: if r < g, Equation (9) then 
11: Xu{t+\)^Y 

12: else 

13: Xkit + \) ^ Xkit) 

14: end if 
15: end for 
16: t ^t+\ 
17: end for 



will have very few independent samples and the sampling will not be representative of the 
target density. Conversely, if a/- ~ 1 then nearly all steps are accepted and the chain is 
performing a random walk with no regard for the target density so this will also not produce 
representative samples. As a rule of thumb, the acceptance fraction should be between 0.2 
and 0.5 (for example, Gelman, Roberts, & Gilks 1996). For the M-H algorithm, these effects 
can generally be counterbalanced by decreasing (or increasing, respectively) the eigenvalues 
of the proposal distribution covariance. For the stretch move, the parameter a effectively 
controls the step size so it can be used to similar effect. In our tests, it has never been 
necessary to use a value of a other than 2, but we make no guarantee that this is the optimal 
value. 

Autocorrelation time The autocorrelation time is a direct measure of the number of 
evaluations of the posterior PDF required to produce independent samples of the target 
density. GWIO show that the stretch- move algorithm has a significantly shorter autocor- 
relation time on several non-trivial densities. This means that fewer PDF computations 
are required — compared to a M-H sampler — to produce the same number of independent 
samples. 
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The autocovariance function of a time series X{t) is 

Cf{T) = lim COY [/ {X{t + T))J {Xm . (11) 



t—^oo 



This measures the covariances between samples at a time lag T. The value of T where 
Cf{T) — measures the number of samples that must be taken in order to ensure indepen- 
dence. In particular, the relevant measure of sampler efficiency is the integrated autocorre- 
lation time 

>^ Cf(T) ^Cf(T) , , 

In practice, one can estimate Cf(T) for a Markov chain of M samples as 

Cf{T) ^ j^-^ Yl [/(^(^ + "^)) - (/)] [/(^H) - (/)] • (13) 

m=l 



We advocate for the autocorrelation time as a measure of sampler performance for two 
main reasons. First, it measures a quantity that we are actually interested in when sampling 
in practice. The longer the autocorrelation time, the more samples that we must generate 
to produce a representative sampling of the target density. Second, the autocorrelation time 
is affine invariant. Therefore, it is reasonable to measure the performance and diagnose the 
convergence of the sampler on densities with different levels of anisotropy. 

emcee can optionally calculate the autocorrelation time using the Python module acor^ 
to estimate the autocorrelation time. This module is a direct port of the original algorithm 
(described by GWIO) and implemented by those authors in C++.^ 



4. Discussion &: Tips 



The goal of this project has been to make a sampler that is a useful tool for a large 
class of data analysis problems — a "hammer" if you will. If development of statistical and 
data-analysis understanding is the key goal, a user who is new to MCMC benefits enor- 
mously by writing her or his own Metropolis-Hastings code (Algorithm 1) from scratch be- 
fore downloading emcee. For typical problems, the emcee package will perform better than 
any home-built M-H code (for all the reasons given above), but the intuitions developed by 



^http : / / github . com/df m/ acor 

■^http : / / www . math . nyu . edu/f acuity/ goodman/ sof tware/acor 
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writing and tuning a self-built MCMC code cannot be replaced by reading this document 
and running this pre-built package. That said, once those intuitions are developed, it makes 
sense to switch to emcee or a similarly well engineered piece of code for performance on large 
problems. 

Ensemble samplers like emcee require some thought for initialization. One general ap- 
proach is to start the walkers at a sampling of the prior or spread out over a reasonable 
range in parameter space. Another general approach is to start the walkers in a very tight 
A^- dimensional ball in parameter space around one point that is expected to be close to the 
maximum probability point. The first is more objective but, in practice, we find that the 
latter is much more effective if there is any chance of walkers getting stuck in low probability 
modes of a multi-modal probability landscape. The walkers initialized in the small ball will 
expand out to fill the relevant parts of parameter space in just a few autocorrelation times. 
A third approach would be to start from a sampling of the prior, and go through a "burn-in" 
phase in which the prior is transformed continuously into the posterior by increasing the 
"temperature." Discussion of this kind of annealing is beyond the scope of this document. 

It is our present view that autocorrelation time is the best indicator of MCMC perfor- 
mance (the shorter the better), but there are several proxies. The easiest and simplest indi- 
cator that things are going well is the acceptance fraction; it should be in the 0.2 to 0.5 range 
(there are theorems about this for specific problems; for example Gelman, Roberts, & Gilks 
1996). In principle, if the acceptance fraction is too low, you can raise it by decreasing the a 
parameter; and if it is too high, you can reduce it by increasing the a parameter. However, 
in practice, we find that a = 2 is good in essentially all situations. That means that when 
using emcee if the acceptance fraction is getting very low, something is going very wrong. 
Typically a low acceptance fraction means that the posterior probability is multi-modal, 
with the modes separated by wide, low probability "valleys." In situations like these, the 
best idea (though expensive of human time) is to split the space into disjoint single-mode 
regions and sample each one independently, combining the independently sampled regions 
"properly" (also expensive, and beyond the scope of this document) at the end. In previous 
work, we have advocated clustering methods to remove multiple modes (Hou etal. 2011). 
These work well when the different modes have very different posterior probabilities. 

Another proxy for short autocorrelation time is large expected or mean squared jump 
distance (ESJD; Pasarica & Gelman 2010). The higher the ESJD the better; if walkers move 
(in the mean) a large distance per chain step then the autocorrelation time will tend to be 
shorter. The ESJD is not an affine-invariant measure of performance, and it doesn't have a 
trivial interpretation in terms of independent samples, so we prefer the autocorrelation time 
in principle. In practice, however, because the ESJD is a simple expectation value it can be 
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more robustly evaluated on short chains. 

With emcee you want (in general) to run with a large number of walkers, like hundreds. 
In principle, there is no reason not to go large when it comes to walker number, until 
you hit performance issues. Although each step takes twice as much compute time if you 
double the number of walkers, it also returns to you twice as many independent samples per 
autocorrelation time. So go large. In particular, we have found that — in almost all cases of 
low acceptance fraction — increasing the number of walkers improves the acceptance fraction. 
The one disadvantage of having large numbers of walkers is that the burn-in phase (from 
initial conditions to reasonable sampling) can be slow; burn-in time is a few autocorrelation 
times; the total run time for burn-in scales with the number of walkers. These considerations, 
all taken together, suggest using the smallest number of walkers for which the acceptance 
fraction during burn-in is good, or the number of samples you want out at the end (see 
below), whichever is greater. A more ambitious project would be to increase the number 
of walkers after burn-in; this requires thought beyond the scope of this document; it can 
be accomplished by burning in a set of small ensembles and then merging them into a big 
ensemble for the final run. 

One mistake many users of MCMC methods make is to take too many samples! If all 
you want your MCMC to do is produce one- or two-dimensional error bars on two or three 
parameters, then you only need dozens of independent samples. With ensemble sampling, 
you get this from a single snapshot or single timestep, provided that you are using dozens 
of walkers (and we would recommend that you use hundreds in most applications). The 
key point is that you should run the sampler for a few (say 10) autocorrelation times. Once 
you have run that long, no matter how you initialized the walkers, the set of walkers you 
obtain at the end should be an independent set of samples from the distribution, of which 
you rarely need many. 

Another common mistake, of course, is to run the sampler for too few steps. You can 
identify that you haven't run for enough steps in a couple of ways: If you plot the parameter 
values in the ensemble as a function of step number, you will see large-scale variations over 
the full run length if you have gone less than an autocorrelation time. You will also see 
that if you try to measure the autocorrelation time (with, say, acor), it will give you a time 
that is always a significant fraction of your run time; it is only when the correlation time 
is much shorter (say by a factor of 10) than your run time that you are sure to have run 
long enough. The danger of both of these methods — an unavoidable danger at present — is 
that you can have a huge dynamic range in contributions to the autocorrelation time; you 
might think it is 30 when in fact it is 30 000, but you don't "see" the 30 000 in a run that 
is only 300 steps long. There is not much you can do about this; it is generic when the 
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posterior is multimodal: The autocorrelation time within each mode can be short but the 
mode-mode migration time can be long. See above on low acceptance ratio; in general when 
your acceptance ratio gets low your autocorrelation time is very, very long. 

One significant limitation to the stretch move and moves like it is that they implicitly 
assume that the parameters can be assembled into a vector-like object on which linear 
operations can be performed. This is not (trivially) true for parameters that have non-trivial 
constraints, like parameters that must be integer- valued or equivalent, or parameters that 
are subject to deterministic non-linear constraints. Sometimes these issues can be avoided by 
reparameterization, but in some cases, samplers like emcee will not be useful, or might require 
clever or interesting improvements. The emcee package is open-source software; please push 
us changes! 

It is a pleasure to thank Jo Bovy (IAS), Andrew Gelman (Columbia), Fengji Hou (NYU), 
and Larry Widrow (Queen's) for helpful contributions to the ideas and code presented here. 
We would also like to thank Eric Agol (UW) for pointing out an important typo in the first 
version of this document, emcee makes use of the open-source Python numpy package. 
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A. Installation 

The easiest way to install emcee is using pip^. Running the command 

7„ pip install emcee 

at the command line of a UNIX-based system will install the package in your Python path. 
If you would like to install for all users, you might need to run the above command with 
superuser permissions. In order to use emcee, you must also have numpy^ installed (this can 
also be achieved using pip on most systems), emcee has been tested with Python 2.7 and 
numpy 1.6 but it is likely to work with earlier versions of both of these as well. 

An alternative installation method is to download the source code from http : / / danf m . ca/ emcee 
and run 

7o python setup. py install 

in the unzipped directory. Make sure that you have numpy installed in this well. 

B. Issues & Contributions 

The development of emcee is being coordinated on GitHub at http : //github . com/df m/emcee 
and contributions are welcome. If you encounter any problems with the code, please report 
them at http://github.coni/dfni/emcee/issues and consider contributing a patch. 

C. Code Structure 

The main functionality of this sampler is contained in the EnsembleSampler object (de- 
fined in emcee/ensemble. py) and in the abstract base Sampler object (defined in emcee/sam- 
pler. py). Each EnsembleSampler instance has two Ensemble members that provide the pro- 
posal steps following Algorithm 3. The detailed structure of the code is shown in Appendix E 
or online at http://danfm.ca/emcee. 



''http : //pypi .python, org/pypi/pip 
■^http : //numpy . scipy . org 
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D. Examples 



The best way to get started with the package is to try out some of this sample code. 
Most problems with map nicely to one of these examples and if not, there are more details 
given in Appendix E. We provide to examples below: (1) the classic multivariate Gaussian 
density and (2) a very anisotropic density in 2 dimensions. 



A very simple sample problem and a common unit test for samplers is the performance 
of the code on a high- dimensional Gaussian density 



We will incrementally work through this example here and the source code is available 
online.^ 

First, import the necessary packages and define the function that calculates the loga- 
rithm of Equation (Dl). 

import numpy as np 
import emcee 

def InprobfnCx, mu , icov): 
diff = x-mu 

return -np.dot(diff ,np.dot(icov,diff))/2.0 

Next, choose the number of dimensions to use and generate the mean vector means and the 
positive definite covariance matrix cov and its inverse icov. 

ndim = 10 

means = np . random . rand (ndim) 

cov = . 5-np . random . rand (ndim**2) . reshape ( (ndim , ndim)) 

cov = np.triu(cov) 

cov += cov.T - np . diag ( cov . diagonal () ) 

cov = np . dot ( cov , cov) 

icov = np . linalg . inv ( cov) 



http : / / danf m . ca/ emcee/ quickstart 



D.l. The multivariate Gaussian 




(Dl) 
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The number of walkers to use is one of the tuning parameters of this sampler and we'll come 
back to some discussion of how to choose this shortly. For now, use 100 walkers and generate 
an initial guess for the position of each of the walkers (in a small ball centered at zero mean). 

nwalkers = 100 

pO = [np . random . rand ( ndim ) for i in xrange ( nwalkers ) ] 

Initialize the sampler object passing the chosen hyperparameters and the generated likelihood 
function and its arguments. 

sampler = emcee . EnsembleSampler (nwalkers , ndim, Inprobf n , 

args=[means, icov] ) 

Now, run a "burn-in" chain of 500 steps. Note that each walker is advanced 500 times here 
so Inprobfn is actually be called 5 x 10"^ times. 

pos , prob , state = sampler . run_mcmc (pO , 500) 

Finally, reset the chain and sample 2000 steps starting from the final state of the burn-in 
chain. 

sampler. reset () 

sampler . run_mcmc (pos , 2000, rst at e0= state ) 

From here, you can begin the post-processing procedure. To plot a histogram of the samples 
for a particular parameter i, you can get a list of the samples using 

samples = sampler . flatchain [:, i] 

samples will now be a numpy array with one entry for each step in the chain. The acceptance 
fractions (one for each walker in the ensemble) is given by 

af = sampler . acceptance_fraction 

and the overall acceptance fraction is af .mean(). If you install the acor package^, then the 
autocorrelation time of the chain is given by 

tau = sampler. acor 

tau is now a list with one entry for each parameter (the integrated autocorrelation time for 
that parameter). 



' iittp : // github . com/df m/ acor 
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D.2. The Rosenbrock density 

Despite tlie fact tliat it is only two-dimensional, the Rosenbrock density 

p(.Y)c.exp(-°''-^--^"f^+'^-^')') (D2) 

poses severe problems for all MCMC samplers because of its anisotropy. 

The following example shows how one might sample this density with some more ad- 
vanced techniques. First, we will use an object-oriented approach for defining the likelihood 
function. This seems slightly artificial in this context but in more complicated problems 
where the likelihood calculation requires significant amounts of structured data, this ap- 
proach can be quite beneficial. Also, we use the multiprocessing capabilities of this paral- 
lelized algorithm. Again, the full source code is available online.^ 

First, define the density. 

import numpy as np 
import emcee 

class Ro senbr o ck ( ob j ect ) : 
def __init__ ( self ) : 
self.al = 100.0 
self.a2 = 20.0 



def __call__ ( self , p): 

return -(self.al * (p [1] -p [0] **2) **2 + (1 -p [0] ) **2) /self . a2 

Then make an initial guess for the optimal position and set up the sampler (using 10 CPUs 
to compute the likelihoods in parallel). 

nwalkers = 100 

pO = np . random . rand (iiwalkers*2) . reshape (nwalkers , 2) 
rosenbrock = RosenbrockO 

sampler = emcee . EnsembleSampler (nwalkers , 2, rosenbrock, threads=10) 

Finally, sample 500 steps and write each step to a file. We would recommend writing 
atomically to a binary format (for example, HDF5 or FITS) in practice but this shows one 
possible workflow for the output. 

f = open (" rosenbrock . out " , "w") 

for pos , Inprob , rstate in sampler . sample (pO , iterations=500) : 



http : //danfm. ca/ emcee/rosenbrock 
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# Write the current position to a file, one line per walker 
f . wr ite ( " \n " . j oin ( [ " \ t " . j o in ( [ St r (q) for q in p] ) for p in pos])) 
f .write("\n") 
f . close () 
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E. API 

The detailed structure of the API with all the available options is given below and the 
human- readable annotated source code is available online^. 

emcee. Sampler(clim, Inprobfn, args=[]) 

An abstract sampler object that implements various helper functions 
Arguments 

dim (int): Number of dimensions in the parameter space. 

Inpostfn (callable): A function that takes a vector in the parameter space as input and returns the natural 
logarithm of the posterior probability for that position. 

Keyword Arguments 

args (list): Optional list of extra arguments for Inpostfn. Inpostfn will be called with the sequence lnpostfn(p, 
*args). 

Properties 

acceptance_fraction: An array (length: l<) of the fraction of steps accepted for each walker. 

acor: The autocorrelation time of each parameter in the chain (length: dim) as estimated by the acor module. 

chain: A pointer to the Markov chain itself. The shape of this array is (k, dim, iterations). 

flatchain: A shortcut for accessing chain flattened along the zeroth (walker) axis. 

Inprobability: A pointer to the matrix of the value of Inprobfn produced at each step for each walker. The 
shape is (k, iterations). 

random_state: The state of the internal random number generator. In practice, it's the result of calling 
get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned 
that if you do this and it fails, it will do so silently. 

Methods 

emcee. Sampler . clear_chain() 

An alias for reset kept for backwards compatibility. 

emcee . Sampler . get_lnprob(p) 



^http : //danfm. ca/ emcee/#src 
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Return the log-probability at the given position. 

emcee . Sampler . reset() 

Clear chain, Inprobability and the bookkeeping parameters. 

emcee. Sampler. run_mcmc(posO, N, rstateO=None, lnprobO=None) 

Iterate sample for N iterations and return the result. The arguments are passed directly to sample so 
see the parameter details given in sample. 

emcee. EnsembleSampler(k, *args, **kwargs) 

A generalized Ensemble sampler that uses 2 ensembles for parallelization. 

This is a subclass of the Sampler object. See the Sampler object for more details about the inherited properties. 
Arguments 

k (int): The number of Goodman & Weare "walkers", 
dim (int): Number of dimensions in the parameter space. 

Inpostfn (callable): A function that takes a vector in the parameter space as input and returns the natural 
logarithm of the posterior probability for that position. 

Keyword Arguments 

a (float): The proposal scale parameter, (default: 2.0) 

args (list): Optional list of extra arguments for Inpostfn. Inpostfn will be called with the sequence lnpostfn(p, 
*args). 

postargs (list): Alias of args for backwards compatibility. 

threads (int): The number of threads to use for parallelization. If threads == 1, then the multiprocessing 
module is not used but if threads > 1, then a Pool object is created and calls to Inpostfn are run in parallel, 
(default: 1) 

pool (multiprocessing. Pool): An alternative method of using the parallelized algorithm. If pool is not None, the 
value of threads is ignored and the provided Pool is used for all parallelization. (default: None) 

Exceptions 

Assertion Error: If k < 2*dim or if k is not even. 

Properties 

acceptance_fraction: An array (length: k) of the fraction of steps accepted for each walker. 

acor: The autocorrelation time of each parameter in the chain (length: dim) as estimated by the acor module. 

chain: A pointer to the Markov chain itself. The shape of this array is (k, dim, iterations). 
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flatchain: A shortcut for accessing chain flattened along the zeroth (walker) axis. 

Inprobability: A pointer to the matrix of the value of Inprobfn produced at each step for each walker. The 
shape is (k, iterations). 

random_state: The state of the internal random number generator. In practice, it's the result of calling 
get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned 
that if you do this and it fails, it will do so silently. 

Methods 

emcee. EnsembleSam pier . clear_chain() 

An alias for reset kept for backwards compatibility. 

emcee . EnsembleSampler . get_lnprob(p) 

Return the log-probability at the given position. 

emcee. EnsembleSampler . reset() 

Clear chain, Inprobability and the bookkeeping parameters. 

emcee. EnsembleSampler. mn_mcmc(posO, N, rstateO=None, lnprobO=None) 

Iterate sample for N iterations and return the result. The arguments are passed directly to sample so 
see the parameter details given in sample. 

emcee . EnsembleSampler . sample(pO, lnprobO=None, rstateO=None, iterations=l, **kwargs) 
Advance the chain iterations steps as an iterator. 
Arguments 

posO (numpy.ndarray): A list of the initial positions of the walkers in the parameter space. The shape 
is (k, dim). 

Keyword Arguments 

InprobO (numpy.ndarray): The list of log posterior probabilities for the walkers at positions given by 
pO. If Inprob is None, the initial values are calculated. The shape is (k, dim). 

rstateO (tuple): The state of the random number generator. See the Sampler. random_state property 
for details. 

iterations (int): The number of steps to run. (default: 1) 
Yields 

pos (numpy.ndarray): A list of the current positions of the walkers in the parameter space. The shape 
is (k, dim). 

Inprob (numpy.ndarray): The list of log posterior probabilities for the walkers at positions given by 
pos. The shape is (k, dim). 

rstate (tuple): The state of the random number generator. 
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emcee . E n se m b I e (sa m pier) 

A sub-ensemble object that actually does the heavy lifting of the likelihood calculations and proposals of a new 
position. 

Arguments 

sampler (Sampler): The sampler object that this sub-ensemble should be connected to. 
Methods 

emcee. Ensemble . get_lnprob(pos=None) 

Calculate the vector of log-probability for the walkers. 
Keyword Arguments 

pes (numpy.ndarray): The position vector in parameter space where the probability should be calcu- 
lated. This defaults to the current position unless a different one is provided. 

Returns 

Inprob (numpy.ndarray): A vector of log-probabilities with one entry for each walker in this sub- 
ensemble. 

emcee. Ensemble . propose_position(ensemble) 

Propose a new position for another ensemble given the current positions 
Arguments 

ensemble (Ensemble): The ensemble to be advanced. 
Returns 

q (numpy.array): The new proposed positions for the walkers in ensemble. 

newlnprob (numpy.ndarray): The vector of log-probabilities at the positions given by q. 

accept (numpy.ndarray): A vector of bools indicating whether or not the proposed position for each 
walker should be accepted. 



